ABSTRACT

Improvements  in  networking  allow  for  increasingly  complex  collaboration  environments  with  regard  to  session  scale, 
range of shared tasks, and distance between remote parties. Floor control protocols add an access discipline to such
environments that mitigates race conditions on shared resources and throttles media transmission. Primary
causes  for  resource  competition  among  users  may  be  the  lack  of  mutual  awareness  and  formal  session  orchestration, 
or  network  and  host  limitations. 

Various,  often  proprietary  and  unscalable  solutions  for  floor  control  have  been  implemented  for  telemedicine, 
video conferencing, or distributed interactive simulation. To date, an analytic comparison of the efficacy of
these solutions is lacking. By efficacy, we mean the proportion of time that a protocol takes to allocate a resource,
accounting  for  social  and  technical  overhead  from  user  behavior,  protocol  cost,  and  network  conditions.  We  present  a 
novel  taxonomy  and  comparative  performance  analysis  of  known  classes  of  floor  control  protocols,  including  socially 
driven  protocols,  collision  sensing  on  shared  resources,  floor  token  passing  in  fully-connected  and  ring  topologies,  and, 
innovatively, across shared control trees. Our analysis indicates that aggregated and selective transmission of control
information over a multicast control tree offers the best scalability and efficacy. A novel hierarchical floor control protocol
correlating  in  its  operation  with  tree-based  reliable  multicast  is  outlined. 
1. INTRODUCTION

With  IP-multicasting^  more  powerful  collaborative  multimedia  applications  (CMA)  gradually  enter  mainstream 
computing.  Users  of  such  applications  can  overcome  their  separation  in  time  and  space  and  share  work  efforts  in 
real-time,  with  the  goal  of  approximating  the  quality  of  face-to-face  interactions.  While  earlier  CMA  were  proprietary, 
monolithic,  and  limited  in  media  modality  and  session  size,  new  applications  support  multi-party  and  multimodal 
collaboration  in  large  sessions,  such  as  distributed  interactive  simulations,  distance  learning  seminars,  or  special- 
purpose  MBone  sessions.^  However,  compared  to  advances  in  reliable  multicasting  and  multicast  routing,  there  has 
been  little  progress  with  regard  to  group  coordination  support  for  such  systems. 

Floor  control  is  an  access  discipline  for  CMA,  which  may  solve  flawed  telepresence  and  coordination  problems 
as  reported  by  Isaacs  and  Tang"^  in  studies  on  video  conferencing.  Typically  deployed  in  the  session  or  application 
layer, floor control lets users attain exclusive control over a shared resource by attaining a floor, which is a
short-lived synchronization primitive for multimedia objects. The floor semantics is generalized to multimedia from its
traditional  notion  as  the  “right  to  speak”.®  Deployment  of  floor  control  in  CMA  may  be  complex  due  to  system 
capabilities,  in  terms  of  the  network,  host,  and  communication  subsystem,  as  well  as  user  behavior.  Early  work  on 
floor  controlled  systems  has  many  different  faces,  including  text-based  remote  collaboration,®  electronic  meeting 
support,^  distributed  teleconferencing,®  moderation  of  MBone  seminars,^  or  web-centric  groupwork.®  Users  may 
profit  from  floor  control,  because  it  defines  turn-taking  rules,  fosters  interactivity,  and  prohibits  unfairness.  From  a 
system  perspective,  it  can  provide  quality-of-service  input  regarding  admission  control  and  bandwidth  allocation  for 
bulky  media  streams,^®  or  serve  as  concurrency  control  for  synchronization  of  multiple  media  flows. 
A floor control methodology is characterized by several dichotomies: centralized vs. distributed implementation,^^
mechanism vs. policy, and explicit vs. implicit floor hand-over. In centralized systems, one host controls
floor  assignment  for  an  entire  session,  which  is  easier  to  deploy,  but  suffers  from  overload  and  resilience  problems.  In 
distributed  systems,  each  host  in  a  session  contributes  to  the  global  control  process  as  much  as  its  own  resources  are 
concerned.  Consistency  management  and  efficiency  of  communication  are  major  concerns  with  this  approach.  Hybrid 
solutions  take  the  best  of  both  worlds  and  centralize  tracking  of  single  floors,  while  distributing  the  administrative 
load  over  the  hosts  in  a  collaborative  session.  Hybrid  solutions  are  also  well-suited  for  network  computers,  where  hosts 
have  limited  capabilities  and  communicate  with  each  other  via  session  servers.  A  mechanism,  e.g.,  activity  sensing  or 
token  passing,  is  the  physical  propagation  and  synchronization  apparatus  for  control  information.  It  is  instantiated 
with policies regarding service order, priorities, or fairness, such as “free-for-all”, “moderated”, “prioritized”,
“voice-activated”, or “high-resolution”. Policies allow floor control to be adjusted to the session style, e.g., a lecture, panel, or
laboratory. The service order of queued floor requests, or the signaling on how the floor is attained, has also been
considered a policy. In explicit floor passing, a user must submit a signal specifically to attain the floor, e.g., by
raising the hand or pressing a button, in contrast to implicit passing, as is the case with voice activation.

Despite  this  background  of  work  on  floor  control,  a  detailed  analysis  on  the  operational  principles  and  performance 
of  floor  control  protocols  is  still  missing.  This  paper  presents  a  novel  taxonomy  and  efficacy  analysis  of  floor  control 
protocols,  based  on  their  operational  principles.  Hierarchical  floor  control  is  described,  operating  on  a  control  tree, 
which  matches  the  backbone  reliable  multicast  tree  and  multicast  routing  tree.  The  goal  is  to  show  how  mechanisms 
for  group  coordination,  that  is  floor  control,  fit  in  with  the  emerging  large-scale  Internet  conferencing. 

In  Section  2  we  present  our  system  model  and  taxonomy,  as  the  foundation  for  our  comparative  efficacy  analysis 
presented  in  Section  3.  Section  4  outlines  the  operation  of  a  tree-based  floor  control  protocol.  The  paper  is  concluded 
in  Section  5. 

2.  SYSTEM  MODEL  AND  TAXONOMY 

We  define  important  terms  to  lay  the  foundation  for  a  common  methodology  and  taxonomy  of  floor  control  protocols. 
A collaboration environment is a tuple $CE = \langle \mathcal{G}, \mathcal{U}, \mathcal{R}, \mathcal{F} \rangle$ consisting of hosts $V$ connected by network links $E \subseteq V \times V$,
users $\mathcal{U}$, shared resources $\mathcal{R}$ and floors $\mathcal{F}$. Communication among hosts is solely via message exchange, and not in
shared memory. Links are assumed to be reliable for floor control messages. A collaborative session is a tuple
$CS = \langle sid, \Delta, U, R, F \rangle$, instantiating CE, and is characterized by a unique session identifier $sid$, session duration $\Delta$, users
$U$, resources $R$ and their corresponding floors $F$. The session size $m = |U|$ can vary, depending on whether the session
is open or closed, and in our terms $m > 100$ denotes a “large” session. Sessions may have different organizational
styles, such as lecture, business meeting, panel discussion, or hearing. Users $u \in U$, which may be humans or system
agents, can assume the control roles of floor originator (FO), owning a resource, floor coordinator (FC), moderating
the floor of a resource, or floor holder (FH), having exclusive access to a resource. Depending on the session style,
FH and FO may be identical. Shared resources $r \in R$, e.g., a robotic device, surgical instrument, video and audio
channel, or graphical data object in the user interface, can be replicated or located at a specific host. Finally, we
define a floor as a tuple

$$f(r, T, FC, FH, t_a, \delta, QoS) = st \in F,$$

where $r$ denotes the resource, $T$ is the media type of $r$ (video, audio, graphics, text), FC and FH identify the floor
coordinator and holder, $t_a$ is a timestamp marking the start of the floor holding time $\delta$, and QoS is a
quality-of-service directive indicating delivery properties for $r$. The floor state $st$ = [free | busy | idle | requested | nil] records
the current state of floor $f$ at a host. For simplicity, we assume that there is one user per host, and one floor $f$ per
resource $r$, without finer granularity.
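The floor tuple can be pictured as a small record type. The following is a minimal sketch, not part of the protocol specification: the field names, the string-valued QoS directive, and the `expired` helper are illustrative assumptions layered on the definition above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FloorState(Enum):
    """The five floor states st named in the floor definition."""
    FREE = "free"
    BUSY = "busy"
    IDLE = "idle"
    REQUESTED = "requested"
    NIL = "nil"


@dataclass
class Floor:
    resource: str               # r: the shared resource guarded by this floor
    media_type: str             # T: video, audio, graphics, or text
    coordinator: Optional[str]  # FC: user moderating the floor
    holder: Optional[str]       # FH: user with exclusive access, if any
    t_a: float                  # timestamp marking the start of the holding time
    delta: float                # floor holding time, in seconds
    qos: str                    # QoS directive (free-form string, an assumption)
    state: FloorState = FloorState.FREE

    def expired(self, now: float) -> bool:
        """Hypothetical helper: a busy floor held longer than delta."""
        return self.state is FloorState.BUSY and now - self.t_a > self.delta


floor = Floor("whiteboard", "graphics", "alice", "bob",
              t_a=10.0, delta=30.0, qos="reliable", state=FloorState.BUSY)
print(floor.expired(now=50.0))
```

Keeping the state local to each host mirrors the assumption that hosts communicate only by message exchange; each host would hold its own copy of such a record.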

Floor  control  protocols  rest  on  three  operational  pillars:  (1)  the  random  or  scheduled  access  to  resources,  which 
is  characterized  by  the  mechanism  and  host  topology,  relegating  directives  among  hosts;  (2)  the  centralized  or 
distributed  control  over  floors  in  the  session;  and  (3)  the  floor  policies  established  among  hosts.  These  properties 
are  reflected  in  the  taxonomy  in  Figure  1  (only  paradigms  indicated  with  solid  lines  are  analytically  compared.)  We 
divide known floor control paradigms into two classes:

Random-access  Group  Coordination  (RGC)  lets  users  contend  for  a  shared  resource,  either  mediated  by  social 
protocols,  or  by  remotely  sensing  its  status  before  or  during  resource  access.  Sensing  is  accomplished  either  by  users, 
tracking  each  other’s  activities  through  the  user  interface,  or  by  the  system,  sensing  the  state  of  an  application, 
host,  or  network  for  local  and  remote  activities.  The  global  floor  state  is  marked  with  assertions  on  local  variables 
and no token entity is explicitly exchanged as a placeholder. RGC schemes are inherently contention-based, because
hosts must actively compete for a floor. Remote sensing can be costly and inefficient. In RGC, floor acquisition is
characterized by “sensing”, “busy”, “active”, and “idle” states.

Figure 1. Taxonomy of group coordination.

Scheduled  Group  Coordination  (SGC)  uses  floor  token  passing,  reservation  or  polling  for  pending  control  directives 
as  resource  access  mechanism.  The  token  is  a  unique  placeholder,  which  is  used  to  request,  deny,  reserve,  or  grant  a 
floor.  Regulated  floor  token  capture  disperses  race  conditions  on  resources.  Hosts  can  “ask”  for  the  floor,  or  a  token 
circulates  among  hosts  and  is  “offered”  to  hosts.  Token  tracking  and  ensuring  authenticity  and  consistency  can  be 
costly.  SGC  schemes  typically  operate  on  a  logical  host  topology  such  as  a  ring  or  tree.  In  SGC,  floor  acquisition 
is  characterized  by  “request”,  “deny”,  “grant”,  and  “release”  control  messages.  Predominant  session  geometries  are 
shown in Figure 2. Resource $r_i$ in the shared workspace is hosted by user $u_i$.
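The request, deny, grant, and release directives of SGC can be illustrated with a toy coordinator for a single floor. This is a sketch under stated assumptions: the class name `FloorCoordinator`, the first-come-first-served service order, and the automatic hand-over on release are hypothetical choices, not the behavior of any protocol analyzed here.

```python
from collections import deque
from typing import Deque, Optional


class FloorCoordinator:
    """Hypothetical FC for one floor: FIFO service order, explicit hand-over."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None  # current FH, None while the floor is free
        self.queue: Deque[str] = deque()   # pending requests in arrival order

    def request(self, user: str) -> str:
        """Answer 'grant' if the floor is free (or already held by user), else queue and 'deny'."""
        if self.holder is None or self.holder == user:
            self.holder = user
            return "grant"
        if user not in self.queue:
            self.queue.append(user)
        return "deny"

    def release(self, user: str) -> Optional[str]:
        """Release by the current holder; returns the next holder, or None if the queue is empty."""
        if user != self.holder:
            raise ValueError(f"{user} does not hold the floor")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


fc = FloorCoordinator()
print(fc.request("u1"), fc.request("u2"), fc.release("u1"))
```

A different policy, such as moderated or prioritized access, would replace only the queue discipline, which reflects the separation of mechanism and policy.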


Figure 2. Session geometries, panels (a) through (e).

In Fig. 2(a), users $u_i$ attempt to gain control of a centralized resource $r$ with limited or no coordination. In
Fig. 2(b), participants $u_i$ communicate directly with one another to attain the floor in the most direct manner. This
scheme  is  also  suited  for  achieving  distributed  consensus,  e.g.,  with  voting.  Fig.  2(c)  depicts  ring-based  control,  based 
on  the  idea  of  passing  a  floor  token  in  a  linear  fashion  along  the  hosts  in  the  ring.  In  Fig.  2(d),  a  star-topology  is 
used  to  concentrate  floor  requests  onto  a  dedicated  FC  host,  which  is  a  subcase  of  Fig.  2(e).  In  this  tree  structure, 
control  directives  are  propagated  along  the  branches  of  a  multi-level  control  tree  towards  the  current  FH. 

3.  EFFICACY  OF  FLOOR  CONTROL  PROTOCOLS 

The  following  comparative  analysis  of  known  classes  of  floor  control  protocols  is  a  first  attempt  to  characterize  the 
efficacy  of  interactive  behavior  of  people  and  processes  from  a  resource  contention  perspective.  The  intention  is  not 
to  predict  exactly  how  a  protocol  performs  for  a  given  CE;  accomplishing  this  would  require  far  more  host-  and 
network-specific  details  and  statistics  on  user  behavior.  Our  goal  is  simply  to  assess  the  overhead  of  various  protocols 
with  regard  to  control  state  management.  The  basic  methodology  is  derived  from  multiaccess  communication,^® 
based  on  the  analogy  between  access  mitigation  for  the  data  channel  and  shared  resources. 

We  state  the  following  assumptions  to  make  the  analysis  tractable:  the  individual  processing  cost  for  control 
packets,  including  protocol  overhead  and  user-interface  specifics,  is  the  same  for  all  hosts;  control  message  delivery 
between  hosts  is  reliable  and  no  failures  in  hosts  or  the  network  occur;  information  from  a  host  reaches  any  other 
host  with  the  same  average  system-wide  propagation  delay;  and  the  interarrival  rate  of  floor  requests  is  Poisson, 
given  that  there  is  no  indication  for  cross-correlations  between  subsequent  floor  requests.  We  take  the  interarrival 
rate of floor requests, the task length, and the network propagation delay into account. An important aspect of our
comparative analysis is the impact that multicasting at the network layer has on the efficacy of the floor control
protocols.  When  network  multicasting  is  not  available,  the  user  hosts  are  forced  to  contact  one  another  explicitly, 
which  substantially  increases  the  processing  overhead  at  hosts  and  the  possibility  of  disagreement  on  which  host  has 
the  floor.  We  model  the  existence  of  multicasting  at  the  network  layer  by  assuming  that  a  user  host  needs  only  to 
pass  information  once  to  the  network  (during  an  activity  period  and  to  gain  the  floor)  for  the  information  to  reach 
all  other  hosts  with  the  same  average  delay.  The  notation  used  in  our  analysis  is  summarized  in  Table  1. 


Symbol            Meaning
$\beta_1$         average “think” time before floor token arrival
$\beta_2$         average “think” time at floor token presence
$\gamma$          average processing time for a floor directive
$\delta$          duration of average activity period
$n\varepsilon$    processing and unicasting overhead for $n$ hosts
$\eta$            efficacy of a floor control protocol
$G$               average offered floor request load
$\iota$           average duration of idle time
$\lambda$         floor request interarrival rate
$m$               average number of hosts in session
$n$               average number of active hosts in session
$v$               average vulnerability period
$\tau$            average propagation delay

Table 1. Analysis parameters.


The “think time”, denoted by $\beta_1$ (the time until the expected floor arrives at the local host) and $\beta_2$ (the time
once the floor token is present at the local host and offered to the user), reflects user choices to grab the floor in
token rings; $\gamma$ is the average time to process and communicate a floor directive, including packetization delay, generic
queueing transmission times, and local processing overhead; $\delta$ is the duration of an activity period; we use $n\varepsilon > \tau$
as the additional delay to account for the communication and processing overhead in $n$ actively collaborating hosts;
$G = \delta \times \lambda$ is the normalized offered request load on floors, including new and previously denied and resubmitted
floor requests; $\iota$ sums up the expected idle time for a resource during and after floor holding time; $\lambda$ represents the
relative frequency of demand for resource access and thus indicates the contention level; $m < \infty$ denotes all hosts
in a collaborative session; $n \le m$ is the number of active hosts in the multicast group for the current floor and
transmission; $v$ is the time during which a host’s attempt to access a resource can be intercepted by another host;
and $\tau$ denotes the average end-to-end delay between hosts, coalescing multiple routing hops into one (a packet must
hence traverse on the average the same number of hosts on the path from the sender to a group of receivers). The
activity period and time to process and communicate floor directives are assumed to include the time incurred in
providing feedback among hosts.
Figure 3. Turn taking periods for a resource and three hosts.

A  turn  taking  model  reflecting  the  switching  of  control  over  a  resource  by  users  serves  as  the  blueprint  for  our 
analysis. A typical communication pattern between three users is depicted in Figure 3. Hosts 2 and 3 contend here
for the floor from host 1, and first host 2 acquires the floor, then host 3. A turn consists accordingly of a resource
contention time $X$, an activity (floor holding) time $A$, and an optional idle time $I$. Based on this abstraction, we
define the efficacy of a floor control protocol, denoted by $\eta$, as the proportion of time that a protocol needs to allocate
a resource, including overhead from the protocol itself, the network, and user behavior. Formally, the efficacy is the
ratio of the average floor usage time $U$ vs. the overall average turn length $T = X + A + I$, given by Eq. 1. The
average contention period $X$ and activity period $A$ together are called the average busy period, $B = X + A$. These
turn periods serve as the building blocks for our efficacy analysis, considering both point-to-point and broadcast style
communication.


$$\eta = \frac{U}{T} = \frac{U}{B + I} \qquad (1)$$
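As a worked illustration of Eq. 1, the ratio can be computed directly from the turn periods. The function name `efficacy` and the sample durations below are assumptions chosen for illustration only; they are not measurements.

```python
def efficacy(usage: float, contention: float, activity: float, idle: float = 0.0) -> float:
    """Eq. 1: usage time U divided by the turn length T = X + A + I."""
    busy = contention + activity  # B = X + A
    turn = busy + idle            # T = B + I
    if turn <= 0:
        raise ValueError("turn length must be positive")
    return usage / turn


# A turn with 1 s of contention, 8 s of useful activity, and 1 s of idle time.
print(efficacy(usage=8.0, contention=1.0, activity=8.0, idle=1.0))
```

With these sample values, one fifth of the turn is spent on contention and idling rather than on use of the resource.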


3.1.  Random-Access  Group  Coordination  Schemes 

Incoordinated  Social  Mediation  (RSI)  reflects  purely  random  resource  access  in  self-moderating  sessions.  There  is  no 
system  support  to  mediate  conflicts  and  ensure  system-wide  resource  consistency.  Assuming  that  all  information  is 
sent  reliably  to  all  user  hosts,  the  latency  introduced  by  the  system  can  lead  to  inconsistent  views  of  the  floors  at 
different  hosts  and  corrupt  user  cooperation.  When  this  occurs,  we  assume  that  all  users  involved  in  the  conflict 
must  restart  their  activities. 

Theorem  1.  The  efficacy  of  RSI  without  multicast  support  is 

$$\eta_{RSI} = \lambda(\delta + n\varepsilon)\, e^{-2\lambda(\delta + n\varepsilon)} \qquad (2)$$


Figure 4. Typical RSI timeline (annotated with requests that collide with the start of A and with the end of A).


Proof: Figure 4 shows a prototypical RSI turn. The proof is the same as for medium access in an ALOHA
channel.^® In the point-to-point model, we assume that $n$ hosts are actively monitoring each other, and it takes an
additional time $n\varepsilon$ for all hosts to perceive the activity from a given host. All the information is exchanged reliably,
and the average vulnerability interval is twice the total of the task length and the added overhead incurred in serial
communication of the task information to all hosts (i.e., $\delta + n\varepsilon$), because messages can intercept activities any time.
Because request arrivals are Poisson, the probability that a task is successful is $e^{-2\lambda(\delta + n\varepsilon)}$. The success probability
times the number of arrivals in one activity period results in Eq. 2. □

Corollary  1.  The  efficacy  of  RSI  with  multicast  support  is 

$$\eta^{MC}_{RSI} = \lambda\delta\, e^{-2\lambda\delta} \qquad (3)$$

This  follows  under  the  assumption  that  every  update  to  and  from  a  host  requires  only  one  transmission. 
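The ALOHA-style form of the RSI result can be explored numerically. The sketch below assumes the efficacy has the shape $g\,e^{-2g}$ with $g = \lambda(\delta + n\varepsilon)$, and treats multicast support as setting the unicast overhead $n\varepsilon$ to zero; the function name and the sample parameter values are illustrative assumptions.

```python
import math


def rsi_efficacy(lam: float, delta: float, n: int = 0, eps: float = 0.0) -> float:
    """RSI efficacy g * exp(-2g), with g = lam * (delta + n * eps).

    Leaving n * eps at zero models multicast support (one transmission per update).
    """
    g = lam * (delta + n * eps)
    return g * math.exp(-2.0 * g)


# The expression peaks where the offered load g equals 1/2,
# which gives the pure ALOHA bound 1/(2e).
peak = rsi_efficacy(lam=0.05, delta=10.0)
print(round(peak, 4), round(1.0 / (2.0 * math.e), 4))
```

Beyond the peak, additional requests only add collisions, so any unicast overhead that pushes $g$ past 1/2 lowers the value of the expression.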


Social  Mediation  with  Feedback  (RSF)  assumes  that  users  cooperate  based  on  social  protocols.  Feedback  on 
remote  activities  is  gathered  through  the  user  interface.  If  a  user  contends  for  a  floor  and  perceives  remote  activity, 
she  would  back  off  for  a  random  period  and  attempt  to  reclaim  the  floor,  after  remote  activity  subsided.  RSF 
in  video  conferencing  is  often  realized  as  “voluntary  distributed  control”,^®  where  cooperative  users  switch  video 
streams  manually  on  and  off,  depending  on  whether  they  prefer  to  receive  or  send  specific  video  transmissions.  POTS 
conferencing  also  relies  on  RSF,  which  works  well  for  very  small  groups. 

Theorem  2.  The  efficacy  of  RSF  without  multicast  support  is 
Figure 5. Typical RSF timeline.


Proof: A prototypical timeline of RSF is depicted in Figure 5. The $n$ active hosts need to sense and process remote
activity through the network interface, and $\gamma'$ is the user time to remotely sense the resource state. The success
probability, denoted as $P_s$, equals the probability that no activity packet arrives in an average vulnerability period $v$
of $2(\gamma' + n\varepsilon)$ sec., i.e., $P_s = P[0 \text{ packets in } v] = e^{-2\lambda(\gamma' + n\varepsilon)}$. The average utilization period lasts $U = \delta P_s$, and the
length of the average busy period $B = X + A$ is determined by the time needed to handle unsuccessful floor requests
in the failed contention period $X_f$ and successful requests in $X_s$, with $B = (1 - P_s)X_f + P_s X_s$. An average failed
turn attempt consists of a geometrically distributed indefinite number ($L$) of interarrival times of floor requests with
duration $f$ sec (average time between failed floor-request arrivals), plus the duration of observing a request ($\gamma'$). The
values for $L$ and $f$ have been derived by Takagi and Kleinrock.^ Substituting our notation in these results, the average
time of a failed turn attempt equals $X_f = L f + \gamma' + n\varepsilon + \tau$. An average successful turn lasts
$X_s = \delta + \gamma' + n\varepsilon + \tau$. Finally, based on the Poisson assumption, an idle period consists of an average idle time interval
plus the time until the next floor request arrives on the average, $I = \iota + \frac{1}{\lambda}$. Substituting into Eq. 1, we obtain Eq. 4. □


Corollary  2.  The  efficacy  of  RSF  with  multicast  support  is 


„MC  _ 
With multicast support, the vulnerability period reduces to $2\gamma'$, and $\varepsilon$ becomes negligible.


In  Activity  Sensing  (RAS),^^d8  activities  on  shared  resources  are  monitored  by  a  background  process  at  the 
session  layer  (without  the  user  having  to  do  so),  in  order  to  sense  which  host  currently  operates  on  the  resource.  The 
RAS  concept  is  related  to  collision  sensing  on  a  multiaccess  channel.^®  In  principle,  no  changes  to  a  collaboration- 
unaware  application  are  required,  because  a  modified  X-server  would  intercept  and  filter  calls  to  shared  applications. 
The RAS system agent would back off for the user when perceiving remote activity, or allow immediate access to
the  resource  otherwise.  Distributed  activity  sensing  agents  collectively  monitor  resource  states  more  accurately  than 
humans  could. 

Theorem  3.  The  efficacy  of  RAS  without  multicast  support  is 
Figure 6. Typical RAS timeline.


Proof: The timeline is shown in Figure 6. $\gamma$ is the system time to sense the resource state. Without multicast
support, a host has to send floor directives individually to every other host, which according to our assumption takes
$\gamma + n\varepsilon$. Because multiple unicast messages are used by each host, floor-state inconsistencies can arise during the
entire time the host is exchanging floor directives, i.e., the vulnerability period of the protocol is $v = 2(\gamma + n\varepsilon)$.
Using this vulnerability period, Eq. 6 follows using the same approach as described in the proof of Theorem 4.


Theorem  4.  The  efficacy  of  RAS  with  multicast  support  is 


$$\eta^{MC}_{RAS} = \frac{\delta}{\delta + \tau + \frac{1}{\lambda} + e^{\lambda\tau}(\gamma + 2\tau + \iota)} \qquad (7)$$

Proof: The access strategy is assumed to be nonpersistent, i.e., a host backs off immediately from attempting to access
a resource and claims the resource once it appears free again. The vulnerability period for accessing an unused
resource is one propagation delay, $v = \tau$, within which other hosts can cause a conflict (in contrast to RSF, where twice
the length of the contention interval is the average vulnerability period); therefore, $P_s = P[0 \text{ packets in } v] = e^{-\lambda\tau}$.
The average utilization period is $U = P_s\delta$. The average length of a successful busy period is simply $\gamma + \delta + 2\tau$, which
accounts for the delivery and processing of floor directives, the activity period, and associated network latencies.
The length of an average unsuccessful activity period consists of one truncated activity lasting $\gamma$ sec, followed by
one or more similarly truncated activities sent within time $Y$ sec, where $0 \le Y \le \tau$. The expected value of $Y$ is^^
$\bar{Y} = \tau - \frac{1}{\lambda}(1 - e^{-\lambda\tau})$; therefore, the average duration of a failed contention period is $\gamma + 2\tau - \frac{1}{\lambda}(1 - e^{-\lambda\tau})$. The length
of the average busy period is then $B = \gamma + 2\tau - \frac{1}{\lambda} + e^{-\lambda\tau}(\delta + \tau + \frac{1}{\lambda})$. The average idle interval is again $I = \iota + \frac{1}{\lambda}$.
Substitution into Eq. 1 yields Eq. 7. □
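The substitution in this proof can be checked numerically. The sketch below assumes the readings $U = \delta e^{-\lambda\tau}$, $B = \gamma + 2\tau - 1/\lambda + e^{-\lambda\tau}(\delta + \tau + 1/\lambda)$, and $I = \iota + 1/\lambda$, and compares $U/(B + I)$ with the closed form $\delta / (\delta + \tau + 1/\lambda + e^{\lambda\tau}(\gamma + 2\tau + \iota))$. Both expressions are our reading of the derivation, and the function names and parameter values are hypothetical.

```python
import math


def ras_multicast_efficacy(lam: float, delta: float, gamma: float,
                           tau: float, iota: float = 0.0) -> float:
    """RAS efficacy with multicast, assembled from the turn periods U, B, and I."""
    p_s = math.exp(-lam * tau)                  # no arrival within v = tau
    usage = p_s * delta                         # U
    busy = gamma + 2.0 * tau - 1.0 / lam + p_s * (delta + tau + 1.0 / lam)  # B
    idle = iota + 1.0 / lam                     # I
    return usage / (busy + idle)


def ras_multicast_closed_form(lam: float, delta: float, gamma: float,
                              tau: float, iota: float = 0.0) -> float:
    """The same ratio after multiplying numerator and denominator by exp(lam * tau)."""
    return delta / (delta + tau + 1.0 / lam
                    + math.exp(lam * tau) * (gamma + 2.0 * tau + iota))


args = dict(lam=0.2, delta=5.0, gamma=0.3, tau=0.1, iota=0.5)
print(ras_multicast_efficacy(**args), ras_multicast_closed_form(**args))
```

In the limit of zero propagation delay, sensing time, and idle time, the ratio reduces to $\delta / (\delta + 1/\lambda)$, i.e., the only loss is waiting for the next request.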

3.2.  Scheduled  Group  Coordination  Schemes 

In contrast to the contention time in RGC, $\gamma$ in SGC denotes the time to transmit a floor token to the next host.
Two  SGC  approaches,  polling,  and  reservation,  have  previously  not  been  used  in  telecollaboration.  Polling  involves 
long  wait  times,  and  reservation  schedules  can  quickly  become  obsolete.  We  focus  on  token  passing,  where  the  floor 
is  being  asked  for,  or  offered  to  hosts  in  a  predefined  service  order.  We  discuss  three  main  cases  of  control  topologies 
for  hosts. 


In Direct Coordination (STD), each host in a group is fully connected to every other host. STD improves the
response time; however, the number of links grows as the square of the number of hosts in the session.
Many small-scale commercial video conferencing systems follow this unscalable model.

Theorem  5.  The  efficacy  of  STD  without  multicast  support  is 
Figure 7. Typical STD timeline.


Proof: The timeline for STD is shown in Figure 7. The average utilization period is $U = n\delta P_s$, because floor
capture is perfect and any one of the $n$ active hosts can acquire a floor of holding time $\delta$ with success probability
$P_s = \frac{1}{n}$. A floor may not be released at FH until successor FH’ has received it. The control packet overhead is $\gamma$ and
the propagation delay is $\tau$. Unicasting a floor request to the $n - 1$ active hosts amounts to $(n - 1)(\gamma + \tau + \varepsilon)$, plus
$(\gamma + \tau + \varepsilon)$ for a reply. The average activity duration is $A = \delta$ and may be trailed by an idle interval consisting of a
period $\iota$ and an average interarrival time for all hosts, $I = \iota + \frac{1}{\lambda}$. Substitution into Eq. 1 results in Eq. 8. □

Corollary  3.  The  efficacy  of  STD  with  multicast  support  is 

$$\eta^{MC}_{STD} = \frac{\delta}{\delta + 3(\gamma + \tau) + (n - 1)\varepsilon + \iota + \frac{1}{\lambda}} \qquad (9)$$

With multicast support, the request-reply-release exchange of control packets takes a time of $3(\gamma + \tau) + (n - 1)\varepsilon$, if
we assume that every host incurs host processing overhead from its $n - 1$ neighbors.
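The effect of the per-neighbor processing overhead on STD with multicast can be sketched numerically. The code assumes a contention time of $3(\gamma + \tau) + (n - 1)\varepsilon$ for the request-reply-release exchange, a usage time of $n\delta \cdot \frac{1}{n}$, an activity time of $\delta$, and an idle time of $\iota + 1/\lambda$; the function name and the parameter values are hypothetical.

```python
def std_multicast_efficacy(lam: float, delta: float, gamma: float, tau: float,
                           eps: float, n: int, iota: float = 0.0) -> float:
    """STD efficacy with multicast: U / (X + A + I) under the assumptions above."""
    usage = n * delta * (1.0 / n)                      # U = n * delta * P_s, P_s = 1/n
    contention = 3.0 * (gamma + tau) + (n - 1) * eps   # request, reply, release
    idle = iota + 1.0 / lam                            # I
    return usage / (contention + delta + idle)


# Efficacy for growing numbers of active hosts (illustrative values only).
for n in (5, 50, 500):
    print(n, round(std_multicast_efficacy(1.0, 6.0, 0.5, 0.5, 0.01, n), 3))
```

Even with multicast delivery, the term $(n - 1)\varepsilon$ grows linearly with the number of active hosts, which is one way to see why the fully connected model does not scale.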

In  Ring-based  Coordination  (STR),  a  floor  token  cycles  through  a  logical  ring  arranging  hosts.  Various  systems^®’^^ 
based on this idea have been discussed. A host that is ready to start an activity captures the passing token, inserts
a command sequence with address and control information, sends the activity packets within this turn period, and
transfers the token after completion to the successor host. A host without pending floor requests passes on the offered
token. Tokens held for an excessive time may expire and trigger automatic transfer. The predefined token passing schedule
may not reflect spontaneous interactivity. The floor can be granted ahead of its token position to a successor host,
or it may only be acquired by a host when the token passes through that host. Likewise, a token can be immediately
released after transmission (RAT), or released after one more reception (RAR) at the sending host. For efficiency
reasons we focus on RAT. In our model, the pre-arrival think time $\beta_1$ plus the token-presence time $\beta_2 < \beta_1$ must be
smaller than the ring cycle time. If the floor is taken, then $\beta_2 \approx (\delta + \tau + \gamma)$.

Theorem  6.  The  efficacy  of  STR  without  multicast  support  is 
Figure 8. Typical STR timeline.


Proof: The timeline for STR is depicted in Figure 8. We assume perfect floor capture and that only one host is active
at any time. The average utilization is $U = \delta P_s$, with $P_s$ as the success probability that the token is available in
the period $X = \beta_1 + \beta_2$. The probability that a floor can be claimed by a user is $P_s = 1 - e^{-\lambda(\beta_1 + \beta_2)}$. The token
cycle time is a function of the active host set $n$, and with growing session size it is less likely that all hosts will
engage in floor contention. The cost to transfer a floor token in a cycle involves on the average $\frac{n}{2}$ hosts, amounting
to $\frac{n}{2}(\gamma + \tau + \varepsilon + \beta_2)$ including processing overhead $\frac{n}{2}\varepsilon$. The average turn hence lasts $T = \frac{n}{2}(\gamma + \tau + \varepsilon + \beta_2) + A + I$.
The idle time is again $I = \iota + \frac{1}{\lambda}$. Substituting $U$ and $T$ into Eq. 1 yields Eq. 10. The overhead to maintain the token
is not included in this result. □

Corollary  4.  The  efficacy  of  STR  with  multicast  support  is 


Mc  ^ _ ^(i-e-M/3i+/32)) _ 

f  (r  -1-7-1-  P2)  -b  5(1  -  t  +  X 


(11) 


With multicasting, a token is sent only once to the network interface and the processing overhead $\epsilon$ vanishes.
However, the token still takes on average $n/2$ hops to cycle back for another turn option.
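
To get a feel for how the terms of the STR model interact, the following minimal sketch evaluates efficacy as utilization over average turn length, $\eta = U/T$, with $U$ and $T$ taken from the proof of Theorem 6. It assumes that the token visits $n/2$ hosts on average, that the idle period has the form $\iota + 1/\lambda$, and that the activity period $A$ is supplied by the caller. The function name `str_efficacy` and all sample values are hypothetical and are not the parameters of Section 3.3. Setting `eps=0.0` corresponds to the multicast case of Corollary 4.

```python
import math


def str_efficacy(n, lam, delta, gamma, tau, beta1, beta2, iota, activity, eps=0.0):
    """Efficacy U/T of token-ring floor control (STR) under the assumptions above."""
    # P_s: probability that the token is available within X = beta1 + beta2.
    p_s = 1.0 - math.exp(-lam * (beta1 + beta2))
    # U = delta * P_s: average utilization.
    utilization = delta * p_s
    # I = iota + 1/lambda: idle period (assumed form).
    idle = iota + 1.0 / lam
    # T = (n/2)(gamma + tau + eps + beta2) + A + I: average turn length.
    turn = (n / 2.0) * (gamma + tau + eps + beta2) + activity + idle
    return utilization / turn


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to show the trend with session size.
    common = dict(lam=2.0, delta=1.0, gamma=0.02, tau=0.005,
                  beta1=0.5, beta2=0.1, iota=0.2, activity=1.0)
    for n in (5, 300):
        print(n, round(str_efficacy(n=n, **common), 4))
```

Under these assumed values the ring term $\frac{n}{2}(\gamma + \tau + \beta_2)$ dominates the turn length for the large session, which is the scaling behavior the analysis attributes to STR.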


Tree-based Coordination (STT) allows for more efficient inter-group collaboration and hierarchical mixing of
media sources. Its control infrastructure can work in symbiosis with reliable multicasting over a single shared
acknowledgment tree, allowing more scalable and economic transmission of session data. STT control messages
traverse branches of the tree in a parent-child relation reflecting multicast group membership. Each host need only
deal with messages from immediate neighbor hosts. Floor directives can be coalesced into single messages and can
be forwarded through the tree in an aggregated fashion. Single hosts will hence not suffer from message implosion, a
problem known from reliable multicast with regard to acknowledging received or lost messages. The communication
delay depends on the height of the control tree rather than on the session size. The star topology is a special case of a
tree, which works for small sessions.

Theorem  7.  The  efficacy  of  STT  without  multicast  support  is 


$\eta_{STT} =$

Figure 9. Typical STT timeline.

Proof: The timeline for STT is depicted in Figure 9. We again have perfect floor capture due to explicit token
exchange, with $U = n \delta P_s$ and $P_s = \frac{1}{n}$. With the normalized average path length $P$, the average duration of the
floor capture period amounts to $2P(\gamma + \tau + \epsilon)$. Assuming that a host does not know the location of FH or FC,
exploratory unicast of a control message from one host to its parent and to each of its children would amount to
X  <  {K  +  +  r  +  e)  plus  a  targeted  reply  costing  P{'j  +  r  +  e).  However,  as  outlined  in  Section  4,  hosts  can 

be tagged with unique labels, which allow for efficient absolute or relative routing of control directives toward the
FC or FH. Consequently, a control directive must be sent to only one neighbor node as the gateway on the path
to FH, hence $K = 1$. Signaling the conclusion of the activity period adds another propagation delay, $A = \delta + P\tau$.
The activity period may be trailed by another idle period of average length $I = \iota + \frac{1}{\lambda}$. Substituting into Eq. 1 gives
Eq. 12. □
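
The substitution in this proof can be followed term by term. The sketch below is a reading of the proof text, not a quotation of Eq. 12. It assumes that Eq. 1 has the form $\eta = U/T$, that $P_s = 1/n$, that the idle period is $I = \iota + 1/\lambda$, and that the turn length is the sum of the floor capture period, the activity period $A$, and the idle period $I$.

```latex
\begin{align*}
U &= n\,\delta\,P_s = n\,\delta \cdot \tfrac{1}{n} = \delta, \\
T &= \underbrace{2P(\gamma + \tau + \epsilon)}_{\text{floor capture}}
   + \underbrace{\delta + P\tau}_{A}
   + \underbrace{\iota + \tfrac{1}{\lambda}}_{I}, \\
\eta_{STT} &= \frac{U}{T}
   = \frac{\delta}{2P(\gamma + \tau + \epsilon) + \delta + P\tau + \iota + \tfrac{1}{\lambda}} .
\end{align*}
```

Under this reading the session size $n$ cancels out of $U$, so the efficacy depends on the path length $P$ and the link parameters rather than on the number of hosts.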

Corollary  5.  The  efficacy  of  STT  with  multicast  support  is 


$\eta^{MC}_{STT} =$

With multicasting, a request-reply pair takes two $\gamma$ and $\tau$, plus another $\tau$ to signal completion of the turn. A close
correlation between the control tree of the STT protocol and the end-to-end multicast tree is assumed. A host sends
only one message to the network interface, i.e., $P = 1$, $K = 1$, and $\epsilon$ becomes negligible.


3.3.  Results 

We compare $\eta^{MC}$ of the discussed paradigms in four cases: (1) a small group in a network with low link latency;
(2) a small group in a high-latency network; (3) a large group in a low-latency network; and (4) a large group in a
high-latency network. Group sizes are $n = 5$ (small) and $n = 300$ (large), corresponding to PC-conferencing systems
and to the average MBone session size, respectively. Latency is given by $\tau = 0.005$ s (low) and $\tau = 0.4$ s (high). The time to sense
floor information is $\gamma' = 0.25$ s. A control packet of 25 bytes length is measured with $\gamma = \epsilon = 0.02$ s. The normalized
activity time is $\delta = 1$ and token-ring “think times” are set to $\beta_1$ = | s and $\beta_2$ = -^ s. The typical idle time is chosen
as $\iota$ = I s. Figures 10 and 11 plot the resulting efficacy.
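
The four cases can be tabulated with a few lines of code. The sketch below does this for STT with multicast support only, assuming the efficacy takes the form $\delta / (2(\gamma + \tau) + \delta + \tau + \iota + 1/\lambda)$, which is what the description of Corollary 5 suggests for $P = 1$, $K = 1$ and negligible $\epsilon$. The group sizes, latencies, $\gamma$ and $\delta$ are the values stated above. The idle time `iota`, the request rate `lam`, and the function names are placeholder assumptions chosen for illustration.

```python
from itertools import product

GROUP_SIZES = {"small": 5, "large": 300}   # n, as stated in the text
LATENCIES = {"low": 0.005, "high": 0.4}    # tau in seconds, as stated in the text
GAMMA = 0.02                               # control packet time in seconds
DELTA = 1.0                                # normalized activity time


def stt_efficacy_mc(delta, gamma, tau, iota, lam):
    """Assumed form of STT efficacy with multicast (P = 1, K = 1, eps ~ 0)."""
    capture = 2.0 * (gamma + tau)   # request-reply pair
    activity = delta + tau          # activity plus completion signal
    idle = iota + 1.0 / lam         # idle period (assumed form)
    return delta / (capture + activity + idle)


def scenario_table(iota=0.2, lam=10.0):
    """Efficacy per (group size, latency) case; iota and lam are placeholders."""
    table = {}
    for (size, _n), (latency, tau) in product(GROUP_SIZES.items(), LATENCIES.items()):
        # The group size does not enter the assumed formula, so only tau matters.
        table[(size, latency)] = stt_efficacy_mc(DELTA, GAMMA, tau, iota, lam)
    return table


if __name__ == "__main__":
    for case, eta in scenario_table().items():
        print(case, round(eta, 3))
```

Because $n$ does not appear in the assumed expression, the small and large cases coincide for a given latency, and only the change in $\tau$ separates the low-latency and high-latency rows.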


Figure 10. Efficacy with multicast support for low network latency. Panels: comparison for small sessions / low latency; comparison for large sessions / low latency.

Figure 11. Efficacy with multicast support for high network latency.

The efficacy of RSI is below 20% in all four scenarios. Systems employing RSF benefit from coordination attempts
and improve slightly over RSI, approximating 25% efficacy in networks with faster links. Both schemes are unstable
even at low request rates: collaboration efficacy falls off quickly and the average delay becomes unbounded.
This models the phenomenon that users avoid frequent turn-taking in telecollaboration. RAS was a first attempt
toward unintrusive machine-assisted floor control and performs very well in local area networks, where the system’s
responsiveness warrants quick updates and steady resource tapping. For high link latencies it sustains only slightly
higher loads than RSF; it also degenerates quickly and barely rises above 20%. STD performs well for
small groups, where a host needs to send only a few packets to the remainder of the session, but despite its stability at higher
loads it barely exceeds 15% efficacy in large sessions. In its best case of small sessions and low network latency,
STR approximates 65% efficacy and is generally stable, but it degrades with rising scale and delay. In particular,
it collapses for large sessions with high latency. STT shows the best overall behavior in terms of both scalability
and stability. It reaches up to 80% efficacy in low-latency networks and about 40% efficacy for high link latencies,
independent of session size. According to these results, STT fills a gap and is particularly well suited for Internet
collaboration in large groups.

4.  A  TREE-BASED  FLOOR  CONTROL  PROTOCOL 

The  protocol  outlined  in  this  section  presents  two  innovations:  floor  control  is  inherently  hierarchical,  and  the  control 
tree  for  group  coordination  is  correlated  in  its  operation  to  an  underlying  tree-based  multicast  service,  as  outlined 
with  the  Lorax  protocol. In  this  protocol,  a  single  shared  acknowledgment  (ack)  tree  per  session  is  used  to 
disseminate  information  reliably  among  hosts  on  the  tree.  Hierarchical  acknowledgments  (hacks)  are  used  to  avoid 
the  situation  that  the  source  is  contacted  by  all  hosts  requesting  retransmissions  of  lost  packets.  Lorax  introduces 
recursive top-down labeling of tree nodes such that a label $l(x)$ of node $x$ at level $h$ in the tree is a prefix of its
children’s labels at level $h + 1$. These address labels can be used for absolute or relative self-routing of packets
within  a  session  and  its  multicast  groups.  Adding  a  node  in  the  tree  involves  only  the  new  node  as  a  child  and  its 
parent,  while  deletions  require  relabeling  of  the  subtree  of  the  deleted  node.  The  label  cardinality  depends  on  the 
tree  branching  factor  and  the  session  size. 
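
A small sketch may help to picture the prefix property of such labels. It is not the Lorax labeling procedure itself: the class name `LabeledNode` and the encoding, in which each child appends one decimal digit to its parent's label, are assumptions made for illustration, and node deletion with subtree relabeling is not covered.

```python
class LabeledNode:
    """Tree node whose label extends its parent's label by one digit (assumed encoding)."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add_child(self):
        """Attach a new child; only this node and the new child are touched."""
        index = len(self.children)
        if index > 9:
            raise ValueError("this sketch supports at most 10 children per node")
        child = LabeledNode(self.label + str(index), parent=self)
        self.children.append(child)
        return child

    def is_ancestor_of(self, other):
        """True if this node's label is a proper prefix of the other node's label."""
        return other.label != self.label and other.label.startswith(self.label)


if __name__ == "__main__":
    root = LabeledNode("1")   # level 0
    b = root.add_child()      # level 1, label "10"
    x = b.add_child()         # level 2, label "100"
    c = b.add_child()         # level 2, label "101"
    print(x.label, c.label, root.is_ancestor_of(x), x.is_ancestor_of(c))
```

With one digit per level the label length grows with the tree height and the digit range with the branching factor, which mirrors the stated dependence of label cardinality on branching factor and session size.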

The floor control tree mirrors the Lorax hack tree, sparing the floor control protocol from managing its own
dissemination infrastructure. However, the propagation mechanism is geared toward cascaded processing of control
directives (CDs) for resource sharing, rather than recovery of lost transmissions. A CD contains label information
on senders and receivers, a timestamp, a time-to-live field TTL indicating scope and persistence of the CD, a privacy level
indicator, and a floor descriptor including FH, priority, resource, and current state. Standard CDs are REQUEST, GRANT,
DENY, RELEASE, and STATE UPDATE. The current floor state is tracked in a distributed fashion, and hosts on the tree store the label
of FH for each floor locally for efficient addressing. Communication about the floor state is retained in local groups,
and CDs of a node $x$ are aggregated and responded to by its immediate neighbors, if these nodes can provide an
up-to-date response to a query. Call setup, late joining, and withdrawal from a session are handled by a membership
protocol interfacing with the floor control protocol. Hosts in a session can assume one or more of the following control roles in
the tree: a coordinator node hosts the FH for a resource $r$; relay nodes collect CDs from their children, forward them
in the tree toward the FH, and relay replies back to their children; and leaf nodes delimit tree branches, comparing
on average more bits in the control routing procedure than nodes closer to the root. Address labels make it possible to
maintain only one logical control tree for all floors in a session, instead of maintaining separate trees, one per
floor.
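
The CD contents listed above map naturally onto a small record type. The sketch below is one possible representation, not the protocol's wire format: all class and field names are hypothetical, labels are modeled as strings, and the aggregation rule (merge REQUEST CDs that target the same resource and floor holder, keeping the earliest timestamp) is an assumption about how a relay node might coalesce directives.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Tuple


class CDType(Enum):
    REQUEST = "REQUEST"
    GRANT = "GRANT"
    DENY = "DENY"
    RELEASE = "RELEASE"
    STATE_UPDATE = "STATE UPDATE"


@dataclass(frozen=True)
class FloorDescriptor:
    holder_label: str   # label of the current floor holder (FH)
    priority: int
    resource: str
    state: str


@dataclass(frozen=True)
class ControlDirective:
    kind: CDType
    senders: Tuple[str, ...]     # labels of the originating nodes
    receivers: Tuple[str, ...]   # labels of the addressed nodes
    timestamp: float
    ttl: int                     # scope and persistence of the CD
    privacy_level: int
    floor: FloorDescriptor


def aggregate_requests(cds: List[ControlDirective]) -> List[ControlDirective]:
    """Coalesce REQUEST CDs for the same resource and floor holder into one CD each."""
    groups: Dict[Tuple[str, str], List[ControlDirective]] = {}
    for cd in cds:
        if cd.kind is not CDType.REQUEST:
            continue  # other directive types are not merged in this sketch
        key = (cd.floor.resource, cd.floor.holder_label)
        groups.setdefault(key, []).append(cd)
    merged = []
    for group in groups.values():
        ordered = sorted(group, key=lambda cd: cd.timestamp)
        senders = tuple(label for cd in ordered for label in cd.senders)
        # Keep the earliest CD as the template and list all requesters in arrival order.
        merged.append(replace(ordered[0], senders=senders))
    return merged
```

Keeping the requesters in timestamp order lets the floor holder serve them first come, first served, although the analysis itself does not prescribe a particular service order.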

The protocol operates in a setup, an active, and a teardown phase. Setup is initiated at session start or when a node $x$
joins a session. Floor state information relevant for locally shared resources is retrieved by $x$ from its neighbor nodes,
and new resources are advertised to the session. The active phase concerns aggregation and self-routing of floor
information in local groups. A request is routed to FH by comparing $\mathit{prefix}(l(x))$ with $\mathit{prefix}(l(FH))$ and is forwarded
by relay nodes on the path to FH. When the floor is granted to $x$, all hosts in the session receive a multicast update
that $l(FH) = l(x)$ and submit their directives to $x$ for a new turn. A floor transfer between FH and its successor
FH’ must be confirmed, or it will not be enacted. Teardown is initiated when a session terminates or when a host
withdraws or fails; floor state information is then retained in a log file but deleted from the records of active session hosts.
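
The prefix comparison used in the active phase can be sketched as a next-hop decision made locally at each node. The code below is an illustration only: it assumes string labels in which every tree level appends exactly one character, so that a parent's label is the node's label without its last character. The function names `next_hop` and `route` are hypothetical.

```python
from typing import List, Optional


def next_hop(own: str, dest: str) -> Optional[str]:
    """Return the label of the neighbor to forward to, or None if dest is this node."""
    if own == dest:
        return None                      # the directive has arrived
    if dest.startswith(own):
        return dest[: len(own) + 1]      # dest lies below: pick the child on the path
    if len(own) <= 1:
        raise ValueError("destination is outside this tree")
    return own[:-1]                      # dest lies elsewhere: hand over to the parent


def route(src: str, dest: str) -> List[str]:
    """Labels visited by a control directive travelling from src to dest."""
    path = [src]
    while (hop := next_hop(path[-1], dest)) is not None:
        path.append(hop)
    return path


if __name__ == "__main__":
    print(route("100", "1"))     # request cascading up toward a floor holder at the root
    print(route("101", "100"))   # request toward a floor holder in a sibling position
```

Each node needs only its own label and the stored label of FH to make this decision, so no routing table beyond the per-floor FH entry is required in this sketch.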


Figure 12. Sample HGCP scenario (diagram labels: FH, CD response).

Figure 12 illustrates the operation of the protocol, indicating collocation of floor control with session control, and
interfacing with reliable multicast below and applications above. Hollow arrowheads indicate data multicasting, and
the other arrows indicate CDs. In scenario 12(a), initially $l(FH) = l(a) = 1$; node $a$ operates on a resource $r$ and
transmits updates to selected session members. Nodes $x$ and $e$ contend for the floor of the shared resource $r$. Looking
up the FH entry for the resource in question, they send a REQUEST CD to their parent, because $l(FH) = \mathit{prefix}(l(e))$
and $l(FH) = \mathit{prefix}(l(x))$. For the parent node $c$ of $e$, $l(FH) = \mathit{prefix}(l(c))$, and similarly $l(FH) = \mathit{prefix}(l(b))$.
Hence, both requests cascade upward in the tree toward the root. Node $b$ is the relay node for $x$ and $e$: it either has already
forwarded the request from $x$ to $a$, if CD(e) arrives after CD(x), or aggregates both CDs into one request and
propagates it to $a$. Assume that $a$ satisfies the request from $x$ first, sending a GRANT CD via $b$ to $x$. Once $x$
confirms reception of the floor, node $a$ multicasts an update with label information $l(FH') = l(x) = 100$ to the
remainder of the session, indicating the start of a new turn. In scenario 12(b), $x$ is the new FH’, starts using $r$,
and multicasts updates to its children. All hosts in the multicast group propagate their CDs toward the location
of FH’, as indicated in scenario 12(c). There, node $y$ has dropped out of the session; all its shared resources $r_i$, with
$FO(r_i) = y$, are withdrawn, and floor state tables across the session are updated accordingly. A new node $z$ in
subtree E also joins the session and sends a STATE CD to its parent, retrieving the current state table for its local
resources shared with the session. A more detailed efficacy analysis and protocol description can be found in Ref. 23.

5.  CONCLUSION 

System  support  for  improved  telepresence  and  group  coordination  in  collaborative  multimedia  systems  is  still  in  its 
infancy.  This  paper  focused  on  a  comparative  analysis  of  floor  control  protocols,  merging  time  aspects  of  protocol 
operations  with  end-user  behavior  and  thus  accounting  for  both  internal  and  external  factors  regarding  control  of 
shared  resources.  This  is  to  our  knowledge  the  first  attempt  to  quantify  the  effectiveness  of  various  floor  control 
mechanisms.  To  keep  our  bare-bones  analysis  tractable,  we  needed  to  make  several  strong  assumptions  to  unify 
various  strategies  of  floor  management  in  one  framework.  A  basic  mechanism  for  tree-based  floor  control  has  been 
outlined,  which  operates  in  close  correlation  to  a  reliable  multicasting  protocol.  Our  conjecture  is  that  hierarchical 
floor  control  is  more  scalable  and  efficient  than  previous  paradigms  of  group  coordination. 